StrassenNets: Deep learning with a multiplication budget

Authors

  • Michael Tschannen
  • Aran Khanna
  • Anima Anandkumar
Abstract

A large fraction of the arithmetic operations required to evaluate deep neural networks (DNNs) are due to matrix multiplications, both in convolutional and fully connected layers. Matrix multiplications can be cast as 2-layer sum-product networks (SPNs) (arithmetic circuits), disentangling multiplications and additions. We leverage this observation for end-to-end learning of low-cost (in terms of multiplications) approximations of linear operations in DNN layers. Specifically, we propose to replace matrix multiplication operations by SPNs, with widths corresponding to the budget of multiplications we want to allocate to each layer, and learning the edges of the SPNs from data. Experiments on CIFAR-10 and ImageNet show that this method applied to ResNet yields significantly higher accuracy than existing methods for a given multiplication budget, or leads to the same or higher accuracy compared to existing methods while using significantly fewer multiplications. Furthermore, our approach allows fine-grained control of the tradeoff between arithmetic complexity and accuracy of DNN models. Finally, we demonstrate that the proposed framework is able to rediscover Strassen’s matrix multiplication algorithm, i.e., it can learn to multiply 2 × 2 matrices using only 7 multiplications instead of 8.
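
The final claim above can be made concrete: Strassen's algorithm is exactly a width-7 SPN of the kind the abstract describes. Two ternary matrices form signed combinations of the entries of A and B (additions and subtractions only), a single element-wise product contributes the 7 multiplications, and a third ternary matrix recombines the products into C. The NumPy sketch below is our illustration rather than the authors' code; the names Wa, Wb, Wc and the row-major vectorization are assumptions made for this example.

    import numpy as np

    # Ternary SPN weights (entries in {-1, 0, +1}); hidden width r = 7.
    # Row i of Wa (resp. Wb) selects the signed combination of the entries
    # of A (resp. B) that feeds Strassen's i-th product.
    Wa = np.array([[ 1,  0,  0,  1],
                   [ 0,  0,  1,  1],
                   [ 1,  0,  0,  0],
                   [ 0,  0,  0,  1],
                   [ 1,  1,  0,  0],
                   [-1,  0,  1,  0],
                   [ 0,  1,  0, -1]], dtype=float)
    Wb = np.array([[ 1,  0,  0,  1],
                   [ 1,  0,  0,  0],
                   [ 0,  1,  0, -1],
                   [-1,  0,  1,  0],
                   [ 0,  0,  0,  1],
                   [ 1,  1,  0,  0],
                   [ 0,  0,  1,  1]], dtype=float)
    Wc = np.array([[ 1,  0,  0,  1, -1,  0,  1],
                   [ 0,  0,  1,  0,  1,  0,  0],
                   [ 0,  1,  0,  1,  0,  0,  0],
                   [ 1, -1,  1,  0,  0,  1,  0]], dtype=float)

    def spn_matmul(A, B):
        # Applying the ternary matrices costs only additions/subtractions;
        # the element-wise product below is the only source of multiplications.
        products = (Wa @ A.reshape(4)) * (Wb @ B.reshape(4))  # 7 multiplications
        return (Wc @ products).reshape(2, 2)

    A = np.array([[1., 2.], [3., 4.]])
    B = np.array([[5., 6.], [7., 8.]])
    assert np.allclose(spn_matmul(A, B), A @ B)  # [[19, 22], [43, 50]]

In StrassenNets these ternary matrices are learned from data instead of being fixed, and the hidden width r becomes the per-layer multiplication budget.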

Similar resources

Learning Invariance with Compact Transforms

The problem of building machine learning models that admit efficient representations and also capture an appropriate inductive bias for the domain has recently attracted significant interest. Existing work for compressing deep learning pipelines has explored classes of structured matrices that exhibit forms of shift-invariance akin to convolutions. We leverage the displacement rank framework to...


Deep Learning With Noise

Recent work has shown that allowing some inaccuracy when training deep neural networks can improve not only training performance but also model accuracy. Taking that work as guidance, we study the impact of introducing different types of noise into different components of deep neural network training. We intend to experiment with...


Finite Sum Acceleration vs. Adaptive Learning Rates for the Training of Kernel Machines on a Budget

Training predictive models with stochastic gradient descent is widespread practice in machine learning. Recent advances improve on the basic technique in two ways: adaptive learning rates are widely used for deep learning, while acceleration techniques like stochastic average and variance reduced gradient descent can achieve a linear convergence rate. We investigate the utility of both types of...


The Relationship of Study and Learning Approaches with Students’ Academic Achievement in Rafsanjan University of Medical Sciences

Introduction: Most experts consider the learning approach the fundamental basis of learning, dividing it into a deep approach and a surface approach. This study investigates the relationship between learning and study approaches and academic achievement among students of Rafsanjan University of Medical Sciences. Methods: This descriptive cross-sectional stu...


Improving the Neural GPU Architecture for Algorithm Learning

Algorithm learning is a core problem in artificial intelligence, with significant implications for the level of automation that machines can achieve. Recently, deep learning methods have emerged for synthesizing an algorithm from its input-output examples, the most successful being the Neural GPU, which is capable of learning multiplication. We present several improvements to the Neural GPU that substantia...



Journal:
  • CoRR

Volume: abs/1712.03942  Issue: -

Pages: -

Publication date: 2017